Never before in history have there been so many people on Earth as right now. The number has surged over the years, from around 1 billion in 1800 to 7.5 billion in 2017.
Estimates of the population at earlier times have been made too: when agriculture emerged, around 10,000 BC, estimates of the world population range between 1 million and 15 million. Even earlier, about 70,000 years ago, studies suggest that humans may have gone through a bottleneck of 1,000 to 10,000 people, according to the theory of the Toba supervolcanic eruption\(^{[1]}\).
Given the population growth of the last century, what should we expect for the next one? Will it lead to major changes in our lifestyle, or to wars, poverty, shortages of primary resources and so on?
Or maybe all those are just unwarranted fears and everything is going to fix itself?
For this study I combined several datasets. I started from the “Countries of the world” dataset, which you can find on Kaggle at https://www.kaggle.com/fernandol/countries-of-the-world and which contains free data from the World Factbook.
To analyze the situation in Italy in earlier years as well (1700 - 1960), I added the data found here: http://www.populstat.info/Europe/italyc.htm
To have an estimate of the world’s population from the year 1 AD, I took the data from here: www.ecology.com/population-estimates-year-2050/.
I then extended the analysis by combining those datasets with additional ones from World Bank Open Data at https://data.worldbank.org/, from which I downloaded the data on total population, birth rate and death rate (both per 1,000 people) and the Gross Domestic Product of each Country. Those datasets contain the values of the corresponding indicators from 1960 to 2016 for nearly every Country in the world, with some missing data.
To analyze the data, I used several R packages: mainly dplyr, leaflet, ggplot2 and tidyr, but also geojsonio, rworldmap and countrycode for parsing and for the leaflet plots, and htmlwidgets and htmltools to save or display some interactive maps.
Let’s first have a look at the early ages of human population growth:
To generate this plot, I just had to read a simple two-column table. I decided to use plotly, to have an interactive view of the data.
As we can see, there are a lower and an upper estimate for those values: in those ages the world population starts to grow, with a trend that seems almost exponential!
I think a logistic interpolation is worth trying here.
Let’s take a look at the World Bank Open Data datasets: for simplicity I am going to describe the “Total Population” dataset; the others are organized in the same way. When you download an indicator from this site, you end up with three files:
Indicator Name and Code are of little use for our purposes, because they are only a brief description of the table content. Country name and code are, on the other hand, essential: each row of the table contains all the data for one single Country, for that indicator, for the 1960 - 2017 time frame. The columns named after the years contain the actual data.
I started with some data parsing: I renamed the year columns to remove the ‘X’ in front of each year, then I removed the rows and columns that contained only NA values (not the ones with only some NAs, to be clear).
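The cleaning steps just described can be sketched in base R as follows. This is only a minimal illustration on a made-up table, not the code in the author's script: the data frame `raw`, its country names and its values are all hypothetical. The ‘X’ prefix is what `read.csv()` produces when a column name starts with a digit.

```r
# Hypothetical miniature of a World Bank table as read.csv() would return it:
# year columns carry an "X" prefix because names cannot start with a digit.
raw <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  X1960   = c(10, NA, NA),
  X1961   = c(11, 21, NA),
  X2017   = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# 1. Strip the leading "X" from the year columns only.
names(raw) <- sub("^X([0-9]{4})$", "\\1", names(raw))

# 2. Drop columns that contain only NA values.
all_na_col <- vapply(raw, function(col) all(is.na(col)), logical(1))
clean <- raw[, !all_na_col, drop = FALSE]

# 3. Drop rows whose year values are all NA (partially filled rows are kept).
year_cols  <- grep("^[0-9]{4}$", names(clean), value = TRUE)
all_na_row <- rowSums(!is.na(clean[, year_cols, drop = FALSE])) == 0
clean <- clean[!all_na_row, , drop = FALSE]

print(clean)
```

In this toy table the 2017 column and the “Country C” row disappear, while “Country B”, which has only one missing value, is kept.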
After that, I added to each Country the “Continent” and “Region” variables (the latter indicates in which part of the Continent the Country is located), to check how the trend of the selected indicators depends on them. To do this I added some columns to the dataset by means of dplyr and its mutate command. This step can be found in the “Data_Cleaning.R” script file: I used it to parse the data and then to save the cleaned datasets, ready to work on.
I then checked the data on the World’s total population in different years, to verify that the values were consistent with the well-known actual figures. I used dplyr again to select and filter the data, doing something like:
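As an illustration of attaching “Continent” and “Region” to each row, here is a small base R sketch that uses a join against a lookup table instead of dplyr's mutate. The tables `population` and `lookup` are hypothetical and hand-written; a real lookup would have to cover every country code in the dataset.

```r
# Hypothetical miniature of the cleaned table.
population <- data.frame(
  Country = c("Italy", "Japan", "Kenya"),
  Code    = c("ITA", "JPN", "KEN"),
  stringsAsFactors = FALSE
)

# Hand-written lookup for illustration only (three codes).
lookup <- data.frame(
  Code      = c("ITA", "JPN", "KEN"),
  Continent = c("Europe", "Asia", "Africa"),
  Region    = c("Southern Europe", "Eastern Asia", "Eastern Africa"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of the table even if the lookup has no match,
# so that unmatched countries show up with NA instead of vanishing.
population <- merge(population, lookup, by = "Code", all.x = TRUE)
print(population)
```

Keeping unmatched rows (`all.x = TRUE`) makes it easy to spot codes that still need a Continent or Region.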
library(dplyr)

# World total for 1960: "World" is itself a row of the dataset
total_population %>%
  select(Country, `1960`) %>%
  filter(Country == "World")
As you can see, the “World” data is also a row of our dataset.
So I had to be careful: apart from “World”, other non-Country aggregates were also included as rows in the dataset. Indeed, when I ran my first tests, things didn’t add up!
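The double counting described here can be reproduced on a toy table. The sketch below is an assumption about how such a check might look, not the author's code: the values are made up and the list of aggregate names is illustrative, since in practice it has to be built by inspecting the file.

```r
# Hypothetical miniature of the population table: two countries plus two
# aggregate rows of the kind that sit alongside the countries.
total_population <- data.frame(
  Country = c("Country A", "Country B", "High income", "World"),
  `1960`  = c(40, 60, 40, 100),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rows that are not countries (illustrative list).
aggregates <- c("World", "High income")

countries <- total_population[!total_population$Country %in% aggregates, ]

sum_all       <- sum(total_population$`1960`)  # counts aggregates too
sum_countries <- sum(countries$`1960`)         # countries only
world_row     <- total_population$`1960`[total_population$Country == "World"]

cat("All rows:", sum_all,
    " Countries only:", sum_countries,
    " World row:", world_row, "\n")
```

Summing every row overshoots the “World” figure, while the sum over countries only matches it.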
What is the trend of the world population in recent years? I decided to plot it, but for a better analysis I plotted it next to the yearly growth percentage of the whole world, computed as:
\[ GrowthPercentage = \frac{P_{t}-P_{t_0}}{P_{t_0}} \cdot 100 \]
where \(P_{t}\) represents the population in a given year and \(P_{t_0}\) the population in the preceding year.
The image below gives a view of how people are distributed across the Continents and the different Regions in 2017.
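A year-over-year growth percentage can be computed in a couple of lines of base R. The sketch below takes the preceding year's population as the base of the ratio, which is the usual convention and is assumed here; the population values are hypothetical.

```r
# Hypothetical yearly population values (not real data).
years      <- 2000:2004
population <- c(100, 102, 105, 105, 103)

# Change relative to the preceding year, in percent.
# diff() gives P_t - P_t0; head(population, -1) gives P_t0.
growth_pct <- diff(population) / head(population, -1) * 100

# The first year has no predecessor, so the result starts at the second year.
growth <- data.frame(year = years[-1], growth_pct = round(growth_pct, 2))
print(growth)
```

A value of zero marks a year with no change, and a negative value marks a year in which the population shrank.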
Density Indicator
Given the “total population” dataset, I created a new one containing the population density of the different Countries, to then show it on a map: to do so I took the “Area” measure from the “countries_world” dataset taken from Kaggle and computed the density as the number of people divided by the area.
Of course, I had to create a discrete scale of values to make the map work.
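The density computation and the discrete scale can be sketched as follows. The countries, populations and areas are hypothetical, and the break points are arbitrary choices made for illustration; the bins used for the actual map are not stated in the text.

```r
# Hypothetical inputs: population and area per country
# (area in whatever unit the source table uses).
density_data <- data.frame(
  Country    = c("Country A", "Country B", "Country C"),
  Population = c(5e6, 6e7, 1.2e9),
  Area       = c(1e6, 3e5, 3e6),
  stringsAsFactors = FALSE
)

# Density = number of people per unit of area.
density_data$Density <- density_data$Population / density_data$Area

# Discrete scale: left-closed bins, with an open-ended last bin so that
# no country falls outside the scale.
breaks <- c(0, 10, 50, 100, 250, 500, Inf)
density_data$DensityClass <- cut(density_data$Density,
                                 breaks = breaks, right = FALSE)

print(density_data)
```

The resulting factor can then be mapped to a fixed palette, one color per bin.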
Leaflet
Let’s now have a look at the World’s situation. Below you can see a leaflet representation of the distribution of people across the whole world: I created the interactive leaflet map using the code you can find in the “leaflet_map.R” script. For it, I had to rearrange the data further because some Country names were not the ones leaflet expected, so some data were not shown at all: for example, “United States of America” was expected instead of “United States”, so I had to check a lot of names manually.
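One way to organize the manual name check is a small lookup of corrections plus a test for names that still do not match. This is a sketch under stated assumptions: the helper `harmonise_names` and the vectors below are hypothetical, and only the “United States” pair comes from the text.

```r
# Names as they appear in the data, mapped to the names the map expects.
# Only this one pair is taken from the text; extend the vector as needed.
name_fixes <- c("United States" = "United States of America")

# Replace every name that has an entry in `fixes`; leave the others alone.
harmonise_names <- function(x, fixes) {
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

data_names <- c("Italy", "United States", "France")
map_names  <- c("Italy", "United States of America", "France", "Spain")

fixed <- harmonise_names(data_names, name_fixes)

# Names still unmatched after the fix would not be drawn on the map.
print(setdiff(fixed, map_names))
```

Running `setdiff()` before and after each correction gives a shrinking list of names that still need attention.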
Gross domestic product
The GDP is defined as “an aggregate measure of production equal to the sum of the gross values added of all resident and institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs).” It is considered the “world’s most powerful statistical indicator of national development and progress”.
What happens when a poor country starts moving towards welfare and an industrialized economic system? The birth rate, which is usually high in a poor country, is no longer offset by a high death rate, and the population starts to grow. Then, at a certain point, fears about overpopulation start to rise.
In 1929 the American demographer Warren Thompson developed the theory of the Demographic Transition[5], which describes a transition from high birth and death rates to lower birth and death rates.
This theory can involve four to five stages of transition of the trend of population growth. Here’s a summary of the five steps:
Let’s have a look at the situation in Italy, for example: in the graphs below you can see the trend of births, deaths, and the total population from 1960 to 2016.
To have a better look at the situation I searched for a dataset that also included earlier years: the graph below shows the trend of the total population starting from 1700.
The green shaded area of the second graph corresponds to the data shown in the green-line graph above.
Italy is currently in Stage 4 of the Demographic Transition Model: as we can see from the graphs above, birth rates and death rates are both low; moreover, the Population Growth Rate (PGR) is low, so the population is stabilizing. The PGR between two times \(t_1\) and \(t_2\) is computed as:
\[ PGR = \frac{P(t_2) - P(t_1)}{P(t_1)(t_2 - t_1)} \]
A positive PGR indicates that the population is increasing, while a negative one indicates that it is decreasing. A value of zero means that the population has not changed over the selected time interval.
Let’s compute it on the data used above, with one year as the time interval.
As we can see from the plot, the PGR values are quite low, and in the last years (2015 - 2016) they even start to become negative. This suggests that Italy’s population is starting to decline, which is consistent with Italy being in Stage Four of the Demographic Transition.
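The PGR formula translates directly into a small function. The sketch below is a hypothetical helper, not the author's code, and the values are made up; it also accepts unevenly spaced observations, which is useful for historical series that are not yearly.

```r
# Population growth rate between consecutive observations:
#   PGR = (P(t2) - P(t1)) / (P(t1) * (t2 - t1))
pgr <- function(time, population) {
  stopifnot(length(time) == length(population), length(time) >= 2)
  dP <- diff(population)   # P(t2) - P(t1)
  dt <- diff(time)         # t2 - t1
  dP / (head(population, -1) * dt)
}

# Hypothetical values, unevenly spaced in time (not real data).
time       <- c(1900, 1950, 1960, 1961)
population <- c(100, 150, 150, 147)

rates <- pgr(time, population)
print(data.frame(from = head(time, -1), to = time[-1], pgr = rates))
```

With one year as the interval, the denominator reduces to the earlier population, so the PGR is simply the fractional change from one year to the next.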
Let’s try to predict what the World’s population will be in the coming years. To do so, I will use the logistic model of population growth, which is described by the Pearl-Reed logistic equation:
\[ \frac{dN}{dt} = rN(1-\frac{N}{K}) \]
This equation describes the self-limiting growth of a biological population, and was first published (in a different form) in 1838 by Verhulst, a Belgian mathematician and statistician. Pearl and Reed popularized the equation in the twentieth century.
In the equation, N represents the number of individuals at time t, r the intrinsic growth rate and K the maximum number of individuals that the environment can support. It can be integrated, obtaining:
\[ N(t) = \frac{K N_0 e^{rt}}{K + N_0(e^{rt}-1)} \]
where \(N_0\) is the population at the starting time \(t = 0\).
The main feature of the logistic model is that it takes the shape of a sigmoid curve: it describes the growth of a population as an exponential phase followed by a slowdown, bounded by the carrying capacity of the environment.
I then used the World Population data between 1960 and 2017 to try out this model and predict what the population will be in 2100 (the code is in the script prospects.R), ending up with this graph:
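As a sanity check on the integration, the sketch below compares the standard closed-form solution of the logistic differential equation, written here as \(K / (1 + \frac{K - N_0}{N_0} e^{-rt})\), with a simple forward-Euler integration of \(dN/dt = rN(1 - N/K)\). The parameter values are illustrative and not fitted to any data.

```r
# Closed-form logistic solution: K / (1 + ((K - N0) / N0) * exp(-r * t)).
logistic <- function(t, N0, r, K) {
  K / (1 + (K - N0) / N0 * exp(-r * t))
}

# Illustrative parameters (not fitted to any data).
N0 <- 1
r  <- 0.5
K  <- 10

# Forward-Euler integration of dN/dt = r * N * (1 - N / K) up to t = 20,
# used as an independent numerical check of the closed form.
t_end <- 20
steps <- 20000L
dt    <- t_end / steps
N     <- N0
for (i in seq_len(steps)) {
  N <- N + dt * r * N * (1 - N / K)
}

cat("closed form N(20):", logistic(t_end, N0, r, K), "\n")
cat("Euler       N(20):", N, "\n")
```

Both values sit just below K, and the closed form returns \(N_0\) at \(t = 0\) and approaches K for large t, as a sigmoid bounded by the carrying capacity should.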
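For readers who want to experiment, here is one simple way to estimate r and K from equally spaced data and then project forward. It is not necessarily the method used in prospects.R. It relies on the fact that, for logistic growth sampled at unit steps, \(1/N(t+1)\) is a linear function of \(1/N(t)\), so an ordinary linear regression recovers both parameters. The series below is synthetic, generated from the model itself with hypothetical parameters, so the recovery is exact by construction; real data are noisier and the estimates correspondingly rougher.

```r
logistic <- function(t, N0, r, K) K / (1 + (K - N0) / N0 * exp(-r * t))

# Synthetic yearly series generated from the model itself (hypothetical
# parameters), standing in for 58 yearly observations.
t_obs <- 0:57
N_obs <- logistic(t_obs, N0 = 3, r = 0.03, K = 12)

# For unit time steps the logistic solution satisfies
#   1/N(t+1) = exp(-r) * 1/N(t) + (1 - exp(-r)) / K
# so regressing 1/N(t+1) on 1/N(t) gives r and K.
inv_now  <- 1 / head(N_obs, -1)
inv_next <- 1 / N_obs[-1]
fit      <- lm(inv_next ~ inv_now)

slope     <- unname(coef(fit)["inv_now"])
intercept <- unname(coef(fit)["(Intercept)"])
r_hat <- -log(slope)
K_hat <- (1 - slope) / intercept

# Project 140 steps after the first observation (e.g. 1960 -> 2100).
N_proj <- logistic(140, N0 = N_obs[1], r = r_hat, K = K_hat)
cat("r:", r_hat, " K:", K_hat, " projection:", N_proj, "\n")
```

A nonlinear least-squares fit of the closed form is a common alternative; the linearized regression is shown here because it needs no starting values.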
[1] https://en.wikipedia.org/wiki/Toba_catastrophe_theory
[2] https://www.kaggle.com/fernandol/countries-of-the-world
[3] https://data.worldbank.org/
[4] https://www.ecology.com/population-estimates-year-2050/ (early ages)
[5] https://en.wikipedia.org/wiki/Demographic_transition
[6] https://en.wikipedia.org/wiki/Logistic_function
[7] http://www.clker.com/clipart-530947.html (clipart)